Abstract. Clustering a graph means identifying internally dense subgraphs
which are only sparsely interconnected. Formalizations of this
notion lead to measures that quantify the quality of a clustering and 
to algorithms that actually find clusterings. Since, most generally, cor- 
responding optimization problems are hard, heuristic clustering algo- 
rithms are used in practice, or other approaches which are not based 
on an objective function. In this work we conduct a comprehensive ex- 
perimental evaluation of the qualitative behavior of greedy bottom-up 
heuristics driven by cut-based objectives and constrained by intraclus- 
ter density, using both real-world data and artificial instances. Our study 
documents that a greedy strategy based on local movement is superior to 
one based on merging. We further reveal that the former approach gen- 
erally outperforms alternative setups and reference algorithms from the 
literature in terms of its own objective, while a modularity-based algo- 
rithm competes surprisingly well. Finally, we exhibit which combinations 
of cut-based inter- and intracluster measures are suitable for identifying 
a hidden reference clustering in synthetic random graphs. 

1 Introduction 

Graph clustering aims at finding subsets of vertices that are densely connected 
with each other but sparsely connected with the remainder of the graph. In 
the last decades, interest in graph clustering algorithms has grown rapidly, with 
applications ranging from customer recommendation systems to the analysis 
of networks describing social ties or protein-protein interaction. A variety of 
measures have been proposed, which are used to assess and compare different 
clusterings and to guide the design of algorithms. Traditional methods from 
algorithmics often focus on sparse cuts with respect to measures like conductance 
[18] or expansion [16], while, independently of that, a measure called modularity
[21] proved to yield meaningful clusterings on a wide range of application data.

Recently, we systematically assembled a range of self-evident intracluster den- 
sity and intercluster sparsity measures for clusterings, where the latter are based 
on conductance, expansion and density of the cuts induced by the clusters [14].
We further formally stated the problem Density-Constrained Clustering
(DCC), where the objective is to optimize intercluster sparsity under the constraint
that the intracluster density must exceed a given threshold. As optimal
polynomial-time algorithms for DCC are unknown, we investigated how different
combinations of intracluster density and intercluster sparsity measures influence
the efficiency of a greedy optimization strategy based on cluster merging. However,
little is known about its qualitative behavior in practical scenarios, and an
experimental evaluation of DCC has so far been missing.

Our Contribution. We provide a comprehensive study of the practical be- 
havior of greedy graph clustering heuristics driven by cut-based objectives and 
constrained by intracluster density. We give evidence that, in general, greedy 
algorithms based on local vertex moves lead to better quality than the corre- 
sponding merge-based algorithm. We then compare the move-based algorithm 
to a set of reference algorithms from the literature, both with respect to the ob- 
jective of DCC and their ability to reconstruct planted partitions in a family of 
synthetic graphs. We find that the greedy move algorithm compares favorably to 
most reference algorithms in the context of DCC, while a comparison with the 
modularity-based algorithm shows that optimizing modularity implicitly yields 
good results for some variants of DCC. Experiments with planted partition 
graphs suggest that certain combinations of inter- and intracluster measures are 
effective in finding the hidden clustering, while others clearly fail. Together with 
observations about the number of identified clusters, this yields valuable insights 
about the behavior of the respective intra- and intercluster density measures. 
Related Work. Related clustering algorithms are Iterative Conductance Cutting
[18], Markov-Clustering [10], Geometric MST Clustering [6] and a modularity-based
greedy algorithm based on vertex moves [22]; we use these as reference
algorithms. Kannan et al. propose to minimize the cut between clusters, subject to
a guaranteed conductance within clusters [18], which is closely related to
DCC. They further show that Iterative Conductance Cutting has polylogarithmic
approximation guarantees on both of these measures. Brandes et al. conduct
an experimental study on the performance of Iterative Conductance Cutting, 
Markov-Clustering and Geometric MST Clustering, both with respect to qual- 
ity and running times [7]. A similar, but more recent study can be found in [T^ . 
Flake et al. give a clustering algorithm with provable, but interdependent bounds 
on both intra- and a variant of intercluster expansion. The notion of modularity 
was introduced in [21]; an extensive and recent overview of the research on it
can be found in [12]. Apart from these, there is a huge number of publications
on graph clustering, for an overview see [17l4j . 

2 Preliminaries 

Notation. Let $G = (V, E)$ be an undirected, unweighted, and simple graph,
i.e. $G$ is loopless and has no parallel edges. In the following, $n$ will always
denote the number of vertices and $m$ the number of edges in $G$. For two subsets
$A$ and $B$ of $V$, $m_{A,B} := |\{\{u,v\} \in E \mid u \in A, v \in B\}|$ is the number of edges
between $A$ and $B$, $n_A := |A|$ is the number of vertices in $A$, $m_A := |E(A)|$ is its
number of intracluster edges and $x_A := m_{A, V \setminus A}$ the number of intercluster edges
incident to $A$. Further, the volume $v_A$ of $A$ is defined as $v_A := \sum_{v \in A} \deg(v)$.
We restrict ourselves to disjoint clusters in this work. This means, if
$\mathcal{C} = \{C_1, \dots, C_k\}$ is a partition of $V$, we call $\mathcal{C}$ a clustering of $G$ and the
sets $C_i$ clusters. The cluster containing vertex $v$ is $C(v)$, and the clustering that
results from moving vertex $v$ to cluster $D$, i.e.
$(\mathcal{C} \setminus \{C(v), D\}) \cup \{C(v) \setminus \{v\}, D \cup \{v\}\}$, is abbreviated by $\mathcal{C}_{v \to D}$.
A clustering is trivial if either $k = 1$ (all-clustering), or each cluster contains
only one element (singletons). We identify a cluster $C$ with the set of nodes it
constitutes and with its vertex-induced subgraph of $G$. Then the edges in
$E(\mathcal{C}) := \bigcup_{C \in \mathcal{C}} E(C)$ are called intracluster edges and those in
$E \setminus E(\mathcal{C})$ intercluster edges. A clustering measure is a function that maps
clusterings to real numbers, thereby assessing the quality of a clustering. We define high quality
to correspond to high (low) values of intracluster (intercluster) measures and
will always denote intracluster density measures with $i$ and intercluster density
measures with $x$, unless otherwise stated.
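The per-cluster quantities from the notation can be computed directly from an edge list. The following is a minimal sketch; the function name `cluster_stats` and the edge-list representation are assumptions made for illustration only and are not taken from our implementation.

```python
def cluster_stats(edges, cluster):
    """Return (n_A, m_A, x_A, v_A) for a vertex set A in a simple graph.

    edges   -- iterable of pairs (u, v), one entry per undirected edge
    cluster -- iterable of vertices forming the set A
    (Both representations are assumptions chosen for this sketch.)
    """
    a = set(cluster)
    n_a = len(a)                                                # n_A = |A|
    m_a = sum(1 for u, v in edges if u in a and v in a)         # m_A = |E(A)|
    x_a = sum(1 for u, v in edges if (u in a) != (v in a))      # x_A = m_{A, V \ A}
    v_a = 2 * m_a + x_a  # volume: every inner edge adds 2 to the degree sum, every cut edge 1
    return n_a, m_a, x_a, v_a


# Two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(cluster_stats(EDGES, {0, 1, 2}))
```

The identity $v_A = 2 m_A + x_A$ used in the sketch follows from counting each intracluster edge at both of its endpoints and each intercluster edge at its single endpoint in $A$.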

Intracluster Density and Intercluster Sparsity Measures. All interclus- 
ter measures we use are based on cuts or k-way cuts. Separating a single cluster 
from the remaining vertices induces a cut, whose sparsity can be evaluated using 
density, conductance or expansion. This defines a set of sparsity values for the 
whole clustering, from which we can either compute the average or the max- 
imum, yielding maximum/average intercluster density/conductance/expansion 
(mixd, aixd, mixc, aixc, mixe and aixe). Another point of view is to evaluate
the clustering as a whole, i.e. to assess the sparsity of the induced k-way cut
directly. We do this by either counting the number of intercluster edges (nxe) or
by dividing the number of intercluster edges by the maximum possible number,
i.e. the number of intercluster pairs (gxd). It is possible to use similar, cut-based
measures for intracluster density. However, even evaluating these measures for a
given clustering is NP-hard, such that clustering algorithms usually work with
approximations or bounds [18,11,7]. As we intend to use intracluster density
measures as constraints in greedy bottom-up algorithms, it is crucial to be able 
to evaluate them efficiently. We therefore use a more practical approach and de- 
fine intracluster density as the ratio of the number of intracluster edges and the 
number of intracluster pairs. Evaluating this globally leads to global intracluster 
density (gid), whereas the average and minimum of all clusters yields average 
and minimum intracluster density (aid and mid). 
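The edge-count-based measures described above can be evaluated as in the following minimal sketch. It covers only gid, mid, aid, nxe and gxd as described in the text. The unweighted average for aid and the convention that a singleton cluster has density 1.0 are assumptions of this sketch, since a singleton contains no vertex pairs.

```python
def pairs(k):
    """Number of unordered vertex pairs among k vertices."""
    return k * (k - 1) // 2


def measures(edges, clustering):
    """Evaluate gid, mid, aid, nxe and gxd for a clustering given as a list of vertex sets."""
    clusters = [set(c) for c in clustering]
    n = sum(len(c) for c in clusters)
    m = len(edges)
    m_c = [sum(1 for u, v in edges if u in c and v in c) for c in clusters]
    p_c = [pairs(len(c)) for c in clusters]
    # Assumption: a cluster without vertex pairs (singleton) counts as density 1.0.
    dens = [mc / pc if pc > 0 else 1.0 for mc, pc in zip(m_c, p_c)]
    intra_pairs = sum(p_c)
    inter_pairs = pairs(n) - intra_pairs
    nxe = m - sum(m_c)
    return {
        "gid": sum(m_c) / intra_pairs if intra_pairs else 1.0,
        "mid": min(dens),
        "aid": sum(dens) / len(dens),
        "nxe": nxe,
        "gxd": nxe / inter_pairs if inter_pairs else 0.0,
    }


# Two triangles joined by one edge, clustered as {0,1,2,3} and {4,5}.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(measures(EDGES, [{0, 1, 2, 3}, {4, 5}]))
```

In this example the large cluster has four of its six possible edges, so mid is 4/6, while aid is pulled up to 5/6 by the small dense cluster.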

Table 1 summarizes the formalizations of all measures considered. Note that,
in contrast to the set of measures used in [14], we omit the notions of pairwise
densities as they turned out to be very prone to local minima if used with greedy
bottom-up algorithms. Although it does not quite fit into this classification,
Table 1 also includes the objective used by one of the reference algorithms,
modularity, which simultaneously assesses intracluster density and intercluster
sparsity by subtracting from the fraction of intracluster edges the expectation
of this value in a random graph (high modularity corresponds to high quality).
Density-Constrained Clustering. Density-Constrained Clustering is the problem
of optimizing intercluster density while retaining guarantees on the intracluster
density. Considering each combination of intracluster and intercluster
measure listed in Table 1 leads to a family of optimization problems. Slightly
abusing the notation, we consider modularity as an intercluster density objective
in this context.

Problem 1 (Density-Constrained Clustering (DCC)). Given a graph $G = (V, E)$,
among all clusterings with an intracluster density of no less than $\alpha$, find
a clustering $\mathcal{C}$ with optimum intercluster quality.
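For an intracluster density measure $i$ and an intercluster measure $x$ that is to be minimized, Problem 1 can be written as a constrained optimization problem. In the sketch below, the symbol $\mathcal{A}(G)$ for the set of all clusterings of $G$ is notation introduced here for illustration only; for modularity, the minimum is replaced by a maximum.

```latex
\min_{\mathcal{C} \in \mathcal{A}(G)} \; x(\mathcal{C})
\qquad \text{subject to} \qquad i(\mathcal{C}) \ge \alpha
```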

3 Greedy Algorithms for Density-Constrained Clustering 

The following generic greedy algorithms heuristically minimize (maximize) the
objective function of DCC for all density measures considered.
Greedy Merge (GM). Starting from singletons, the algorithm greedily merges
pairs of clusters. In each step, among all pairs of clusters whose merge does not
violate the constraint on the intracluster density, the merge with the largest
benefit to the intercluster density is performed. We recently proposed this algorithm
in the context of DCC [14] and classified combinations of intercluster and
intracluster density with respect to the question of how efficiently this algorithm
can be implemented. Algorithms of this kind are common in the context of
clustering point sets in d-dimensional space, where a basic constraint is that the
number of clusters must not fall below a certain threshold. In the field of graph
clustering, this algorithm is used to optimize modularity [1].
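To illustrate the merge scheme, the following is a minimal and deliberately inefficient sketch of GM for one fixed combination, namely objective nxe under the constraint mid $\ge \alpha$. The function names, the singleton convention (density 1) and the stopping rule (stop when no feasible merge strictly reduces nxe) are assumptions of this sketch, not a description of our implementation.

```python
def pairs(k):
    return k * (k - 1) // 2


def greedy_merge(n, edges, alpha):
    """Greedy Merge sketch: minimize nxe subject to mid >= alpha (vertices are 0..n-1)."""
    clusters = [frozenset([v]) for v in range(n)]
    while True:
        best = None  # (benefit, i, j) of the best feasible merge found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] | clusters[j]
                inside = sum(1 for u, v in edges if u in merged and v in merged)
                if inside < alpha * pairs(len(merged)):
                    continue  # merged cluster would violate the density constraint
                # Benefit for nxe: edges between the two clusters become intracluster edges.
                between = sum(1 for u, v in edges
                              if (u in clusters[i] and v in clusters[j])
                              or (u in clusters[j] and v in clusters[i]))
                if between > 0 and (best is None or between > best[0]):
                    best = (between, i, j)
        if best is None:
            return clusters  # no feasible merge improves the objective
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(greedy_merge(6, EDGES, 1.0))
```

Since only the merged cluster changes in a step, checking its density suffices to maintain the constraint on mid. Ties between equally beneficial merges are broken by iteration order here, which is one of several possible choices.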
Greedy Vertex Moving (GVM). The key ingredient of GVM (Algo. 1) is
a subprocedure that tries to greedily improve the objective function by letting
vertices move to neighboring clusters (Algo. 2). This subprocedure repeatedly
iterates through the vertex set and, for each vertex, performs the most improving
move (subject to the constraint), potentially isolating a vertex, or leaves it
where it was, until a local optimum is reached. Starting with singletons, GVM
first calls this subprocedure and contracts the resulting preliminary clustering
into a super-graph, i.e. each cluster becomes a vertex weighted with the number
of vertices it represents, and edges are summarized such that edge weights reflect
the number of edges in the original graph. This whole process is iterated until
local moving does not yield any further improvement, and results in a hierarchy
of graphs with increasing coarseness. In the second phase (refinement), the hierarchy
is unfurled step by step by projecting the clustering of the (i+1)-th level
of the hierarchy to level i, i.e. the clusters in level i are merged according to the
clustering in level i + 1. After each step, LM is called again on the current level
of the hierarchy to potentially improve the objective function further, until a
clustering for the finest level, i.e. the original graph, is obtained.

Algo. 1: Greedy Vertex Moving

Input : graph $G$, inter, intra, $\alpha$
Output: clustering $\mathcal{C}^0$ of $G$
$G^0 \leftarrow G$, $h \leftarrow 0$
repeat
    $\mathcal{C}^h \leftarrow$ Singletons($G^h$)
    $\mathcal{C}^h \leftarrow$ LM($G^h$, $\mathcal{C}^h$, inter, intra, $\alpha$)
    $G^{h+1} \leftarrow$ contract($G^h$, $\mathcal{C}^h$)
    $h \leftarrow h + 1$
until no more real contractions
while $h > 0$ do
    $h \leftarrow h - 1$
    $\mathcal{C}^h \leftarrow$ project($\mathcal{C}^{h+1}$, $G^h$)
    $\mathcal{C}^h \leftarrow$ LM($G^h$, $\mathcal{C}^h$, inter, intra, $\alpha$)
end
return $\mathcal{C}^0$

Algo. 2: Local Moving (LM)

Input : graph $G$, clustering $\mathcal{C}_{init}$ of $G$, inter, intra, $\alpha$
Output: clustering $\mathcal{C}$ of $G$
$\mathcal{C} \leftarrow \mathcal{C}_{init}$
repeat
    forall the $v \in V$ do
        $\mathcal{A} \leftarrow \{C \in \mathcal{C} \mid \mathrm{intra}(\mathcal{C}_{v \to C}) \ge \alpha$ and $|E(v, C)| > 0\}$
        $N \leftarrow \arg\min_{C \in \mathcal{A} \cup \{\emptyset\}} \mathrm{inter}(\mathcal{C}_{v \to C})$
        if $\mathrm{inter}(\mathcal{C}_{v \to N}) < \mathrm{inter}(\mathcal{C})$ then
            move($v$, $N$)
        end
    end
until no more changes
return $\mathcal{C}$
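The local moving subprocedure can be sketched for one concrete combination as follows. The sketch fixes the objective to nxe and the constraint to mid $\ge \alpha$, works on a single level only (contraction and refinement are omitted), and recomputes cluster densities from scratch instead of maintaining them. All names are assumptions for illustration. For nxe, isolating a vertex never strictly improves the objective, so that move is not considered here.

```python
def pairs(k):
    return k * (k - 1) // 2


def local_moving(n, edges, alpha):
    """Single-level local moving sketch: minimize nxe subject to mid >= alpha."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    label = list(range(n))                 # start from singletons
    members = {v: {v} for v in range(n)}   # cluster id -> vertex set

    def dense_enough(cluster):
        inside = sum(1 for u in cluster for w in adj[u] if w in cluster) // 2
        return inside >= alpha * pairs(len(cluster))

    changed = True
    while changed:
        changed = False
        for v in range(n):
            old = label[v]
            links = {}  # number of edges from v into each neighboring cluster
            for w in adj[v]:
                links[label[w]] = links.get(label[w], 0) + 1
            stay = links.get(old, 0)
            best, best_gain = None, 0
            for c in sorted(links):
                gain = links[c] - stay  # decrease in nxe if v moves to cluster c
                if c == old or gain <= best_gain:
                    continue
                # Both the target cluster and the remainder of the source must stay dense.
                if dense_enough(members[c] | {v}) and dense_enough(members[old] - {v}):
                    best, best_gain = c, gain
            if best is not None:
                members[old].remove(v)
                if not members[old]:
                    del members[old]
                members[best].add(v)
                label[v] = best
                changed = True
    return label


EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(local_moving(6, EDGES, 1.0))
```

Every accepted move strictly decreases nxe, so the loop terminates. With $\alpha = 1$ only cliques are feasible, and the two triangles of the example graph are recovered.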

GVM is closely related to algorithms in the context of graph partitioning 
and has previously been used for modularity-based clustering without constraints 
[5,22]. Neither approximation guarantees nor subexponential bounds on the running
time are known, but experimentally it has been shown to outperform the
corresponding greedy merge algorithm with respect to both quality and efficiency.
For modularity, it can easily be shown that moving a vertex to a cluster
it is not linked with is never the best choice; therefore it suffices to consider
neighboring clusters. Together with the observation that the change in modularity
can be determined in constant time for each move if some information
about the clustering is maintained, this yields a running time in $O(m)$ for each
round in LM. This latter observation on running time also holds for all intracluster
density and intercluster sparsity measures except for mixd, mixc and mixe,
whose values are expensive to maintain.
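The constant-time update for modularity can be made concrete as follows. The sketch assumes the common formulation in which a cluster $C$ contributes $m_C/m - (v_C/2m)^2$ to modularity; this formula and all function names are assumptions of the sketch. Under this formulation, moving $v$ from its cluster to a cluster $D$ changes modularity by $(\ell_D - \ell_C)/m - \deg(v)\,(v_D - v_C + \deg(v))/(2m^2)$, where $\ell_C$ and $\ell_D$ count the edges from $v$ into the rest of its old cluster and into $D$.

```python
def modularity(edges, label):
    """Modularity by direct evaluation: sum over clusters of m_C/m - (v_C/(2m))^2."""
    m = len(edges)
    intra, vol = {}, {}
    for u, v in edges:
        vol[label[u]] = vol.get(label[u], 0) + 1
        vol[label[v]] = vol.get(label[v], 0) + 1
        if label[u] == label[v]:
            intra[label[u]] = intra.get(label[u], 0) + 1
    return sum(intra.get(c, 0) / m - (vol[c] / (2 * m)) ** 2 for c in vol)


def move_gain(m, deg_v, links_old, links_new, vol_old, vol_new):
    """Change in modularity when v moves to another cluster, in O(1).

    links_old -- edges from v to the rest of its current cluster
    links_new -- edges from v to the target cluster
    vol_old   -- volume of the current cluster, including v
    vol_new   -- volume of the target cluster, excluding v
    """
    return (links_new - links_old) / m - deg_v * (vol_new - vol_old + deg_v) / (2 * m * m)


EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
before = modularity(EDGES, [0, 0, 0, 1, 1, 1])
after = modularity(EDGES, [0, 0, 1, 1, 1, 1])   # vertex 2 moved to the other triangle
gain = move_gain(m=7, deg_v=3, links_old=2, links_new=1, vol_old=7, vol_new=7)
print(after - before, gain)
```

The quantities needed by `move_gain` are the cluster volumes, which can be maintained under moves, and the link counts, which are obtained while scanning the neighborhood of $v$.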

Ensuring Strict Improvements. Another issue with a direct application of
GVM to maximum-based measures is that iteratively traversing the whole vertex
set is inefficient if only very few vertex moves potentially decrease the cut of the
cluster with the currently worst value. Even worse, if this cluster is not unique, it
is likely that the search is stuck in a local minimum, as vertex moves generally can
only improve the value for one of these clusters, not for all of them simultaneously.
If we try to prevent this by allowing vertex moves that are not strictly improving,
we somehow have to ensure that the algorithm terminates after a finite number
of operations. We do this in a similar way as proposed in [14] for GM by greedily
optimizing the lexicographical order of the intercluster sparsity values of all the
clusters. Let $L(\mathcal{C}) := (f(C_1), \dots, f(C_k))$, $C_i \in \mathcal{C}$, be the sequence of these values
with decreasing intercluster density, i.e. $f(C_i) \ge f(C_{i+1})$ for $i \in \{1, \dots, k-1\}$.
Then a clustering $\mathcal{C}$ is L-better than $\mathcal{C}'$ if $L(\mathcal{C})$ is lexicographically less than $L(\mathcal{C}')$.
We now determine for each vertex the set of clusterings that can be reached by
moving it. If one of these clusterings is L-better than the current clustering, the
move that results in the L-best sequence is performed. As we strictly improve
the lexicographical order in each step, termination is guaranteed. This means,
we greedily optimize the maximum value but are also allowed to improve the
intercluster sparsity of clusters more locally, yielding better efficiency and the
possibility to escape local minima.
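The L-comparison itself amounts to sorting the per-cluster values in decreasing order and comparing the resulting sequences lexicographically. A minimal sketch follows; the function names are hypothetical. Python's tuple comparison treats a proper prefix as smaller, so how sequences of different length compare is an assumption of this sketch rather than part of the definition.

```python
def l_sequence(values):
    """Per-cluster intercluster values f(C_1), ..., f(C_k), sorted in decreasing order."""
    return tuple(sorted(values, reverse=True))


def l_better(values_a, values_b):
    """True if the clustering with values_a is L-better than the one with values_b."""
    return l_sequence(values_a) < l_sequence(values_b)


# Both clusterings have the same maximum 0.5, so a pure maximum objective cannot
# distinguish them; the second-worst value decides the comparison.
current = [0.5, 0.2, 0.5]
candidate = [0.3, 0.5, 0.4]
print(l_better(candidate, current))
```

The example shows the situation described above: the worst cluster is not unique, yet a move that repairs one of the worst clusters is still recognized as an improvement.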

Determining the Best Move in $O(\deg(v))$ Time. It holds that any two
clusterings resulting from leaving vertex $v$ untouched or from moving $v$ to a
different (or new) cluster can be L-compared in constant time (see App. A).
Furthermore, it is immediate that moving a vertex to a cluster it is not linked
to can never decrease the number of intercluster edges (nxe). This does not
hold for gxd; however, it is not hard to see that GVM never has to consider
non-neighboring clusters for gxd (see App. A). For all other intercluster density
measures this does not hold, as can be seen in the examples in Fig. 4 in App. A.
As configurations like these are only expected in degenerate cases, the impact on
efficiency is large on sparse graphs, and unconnected clusters are not desirable
in the context of graph clustering, we chose to restrict the set of feasible moves
to neighboring clusters. Together with the possibility to compare different moves
in constant time, we get a time complexity of $O(m)$ for each round of the local
move procedure for each of the combinations considered.

Fig. 1: Qualitative comparison of GVM and GM.

4 Experiments 

Qualitative Comparison of Greedy Merge and Greedy Vertex Moving. 

Our first experiments address the question which flavor of greedy algorithm is 
better suited for DCC. As test instances, we used all graphs listed in Table 2 with
fewer than 1000 vertices; these are real-world networks taken from the websites of
Mark Newman and Alex Arenas [B] and are part of the clustering testbed of 
the 10th DIMACS Implementation Challenge [1]. For all proposed combinations
of measures, Figure 1 shows the ratio of the intercluster density obtained by using
GVM and GM, averaged over all graphs. For modularity, this ratio is always
greater than one, confirming that local moving yields better results, regardless
of the choice and strength of the constraint. In combination with gid and mid,
this similarly holds for all other objectives except for nxe. Note that, in contrast
to modularity, we aim to minimize these measures and therefore a value below
one means that GVM attains better results. For nxe, the outcome depends on
the chosen value of $\alpha$. In combination with aid, the outcome is less clear; the
results for nxe are out of bounds as the ratio for some configurations exceeds 300
percent. This can be explained by the observation that aid readily allows (and
thereby encourages) unbalanced clusterings, as bad intracluster density values
of large clusters can easily be compensated by a set of small and dense clusters,
and GM is known to have a tendency to produce unbalanced partitions. As
this most often leads to unintuitive clusterings, we deem aid less suitable in the
context of graph clustering. Disregarding aid for these reasons, in a vast majority
of configurations, GVM outperforms GM. For tackling DCC, we thus solely use
GVM, putting aside the algorithm based on greedy merging.
Effectiveness of Different Objective Functions. The next question we pose
is whether each of the intercluster density measures is effective in optimizing itself
when used as inter in GVM. To answer this question, we conducted the following
experiment on the set of graphs listed in Table 2. In the following, let $\mathrm{GVM}_{i \ge \alpha, x}$
denote GVM incorporating the constraint $i(\mathcal{C}) \ge \alpha$ and the objective $x(\mathcal{C})$.
For each setup of DCC, i.e. intracluster measure $i$, intercluster measure $x$ and
$\alpha \in \{0.0, 0.1, \dots, 1.0\}$, we ranked the clusterings obtained by $\mathrm{GVM}_{i \ge \alpha, y}$ by their
performance with respect to $x$, using all possible objectives $y$ for GVM. Figure
2 shows the distribution of these ranks over all configurations involving gid,
grouped by $x$. The outcome of this experiment is less clear than what might
be expected: none of the intercluster measures, not even modularity, scores the
best quality with respect to itself in all configurations. Nonetheless, in general,
except for nxe, which is clearly dominated by gxd, each objective optimizes itself
quite well. This also holds for mid, while for aid, the outcome is even less clear,
as can be seen in Figures 5 and 6 in App. B.

Reference Algorithms. For a more comprehensive assessment of GVM as a 
means to address DCC, we use the following reference algorithms: 
Table 2: List of the real world test instances ordered by increasing number of vertices.
These are taken from the webpages of Arenas(A) jSj and Newman(N) [20] and are often 
used to compare clustering algorithms. All graphs are part of the clustering testbed of 
the 10th DIMACS Implementation Challenge [1].



– Iterative Conductance Cutting (ICC) [18]: This top-down algorithm iteratively
  splits the input graph into two subgraphs based on a cut with low
  conductance. The process stops when the conductance of the cut exceeds a
  given threshold, which we set to 0.4 in our experiments.

– Markov-Clustering (MCL) [10]: Emulating a random walk, the matrix of
  transition probabilities is alternately taken to the power of e and renormalized
  after taking each entry to the power of r, where e and r are input
  parameters. In our experiments, we set r and e to 2.

– Geometric MST Clustering (GMC) [6]: First, a spectral embedding of the
  graph in d-dimensional space is built. Then the algorithm constructs a Euclidean
  minimum spanning tree and successively deletes the heaviest edge.
  This defines a sequence of forests whose connected components induce a set
  of clusterings. Among these clusterings, the one with the best value according
  to some given objective function is chosen.

– Multi-Level Modularity (MOD) [22]: This is the GVM algorithm based solely
  on modularity without using any constraint. This algorithm has been shown
  to perform very well in the context of modularity optimization [22].
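The alternation of the two matrix operations in MCL can be sketched as follows, using plain Python lists to stay self-contained. The function names, the fixed number of rounds, and the addition of self-loops before the first normalization are assumptions of this sketch and are not taken from the implementation used in our experiments. The sketch omits the extraction of clusters from the final matrix.

```python
def normalize_columns(M):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_round(M, e=2, r=2):
    """One round: take M to the power e, then raise entries to the power r and renormalize."""
    P = M
    for _ in range(e - 1):
        P = matmul(P, M)
    return normalize_columns([[x ** r for x in row] for row in P])


def mcl(adjacency, e=2, r=2, rounds=10):
    n = len(adjacency)
    # Assumption: self-loops are added so that no column of the start matrix is empty.
    with_loops = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
                  for i in range(n)]
    M = normalize_columns(with_loops)
    for _ in range(rounds):
        M = mcl_round(M, e, r)
    return M


EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    A[u][v] = A[v][u] = 1
M = mcl(A)
print([round(sum(M[i][j] for i in range(6)), 6) for j in range(6)])
```

Taking powers spreads random-walk probability along paths, while the entrywise power followed by renormalization strengthens entries that are already large within a column.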

Comparison Based on Intracluster Density Found by Reference Algo- 
rithms. ICC, MCL and MOD do not incorporate constraints on the intracluster 
density of the resulting clustering. Nonetheless, it is still possible to evaluate 
them with respect to those variants of DCC where $\alpha$ is set to the intracluster
density found by these algorithms. In other words, given the same constraint a
reference algorithm A implicitly adheres to, how well does GVM compare to A
with respect to DCC?

We first ran ICC, MCL and MOD on all test instances in Table 2 and recorded
the intracluster density values of the resulting clusterings. Then, for each reference
algorithm A, intracluster measure $i$, corresponding recorded intracluster
density $\alpha$, and intercluster measure $x$, we
compare the clustering obtained by $\mathrm{GVM}_{i \ge \alpha, x}$ to the clustering of A with respect
to $x$. For GMC the experiments slightly differ as GMC requires an objective
function. We filled this degree of freedom by choosing $f(\mathcal{C}) = i(\mathcal{C}) - x(\mathcal{C})$ as the
objective function for the experiments using $i$ as intracluster and $x$ as intercluster
density measure. This seemed to be the fairest way of comparison and in
almost all cases led to non-trivial clusterings.

Fig. 2: Ranks for different intercluster density measures as objectives in the GVM-algorithm
using gid as constraint, evaluated by the intercluster density of the resulting
clustering.

Table 3 and Table 4 show the percentage of graphs where the greedy algorithm
for $x$ compares favorably and the arithmetic mean of the ratio of $x$
obtained with GVM and with the reference algorithm. As we aim to minimize
intercluster density, a value below 1 indicates that the greedy algorithm succeeds
in beating the reference algorithm and vice versa. Compared to ICC and MCL,
GVM clearly yields better results. The same holds for GMC, except if used in
combination with aid, where GMC sometimes produces far better results. This
can be explained by the fact that aid does not punish unbalancedness and GMC
naturally leads to very unbalanced clusterings in most instances. The outcome
of the comparison with the modularity-based algorithm is less clear. For aid,
GVM performs better, which is not surprising as modularity strongly discourages
unbalanced clusterings. For mid, GVM still beats MOD in the majority of
configurations, while for gid, this only holds for slightly less than half of the
configurations. Furthermore, it is worth mentioning that especially for aixd and aixe
there are instances where modularity minimizes these functions far better than
the respective greedy algorithms. Altogether, the comparison with ICC, MCL
and GMC suggests that GVM effectively addresses DCC, while the comparison
with MOD shows that optimizing modularity is similarly effective in minimizing
cut-based intercluster sparsity measures.

        gid                   mid                   aid
        ICC  MCL  MOD  GMC    ICC  MCL  MOD  GMC    ICC  MCL  MOD  GMC
nxe      84   95   16   63     89   95   63   74     95  100  100   63
gxd      84  100   42  100     95  100   84  100     95  100  100   84
aixd     84  100   42  100     89  100   37   95     95  100  100   84
aixc     84  100   21   53     95  100   79   42     95   95  100   63
aixe     84   95   42   89     89   95   42   95     95   95   95   95
mixd     84   95   53   84     89  100   74   89     89   95   89   74
mixc     89   95   42   37     89   95   63   37     89   95   84   21
mixe     89   95   58   89     84   95   47   79     95   95   89   63

Table 3: Comparison of GVM and reference algorithms. Entries represent the percentage
of graphs on which GVM compares favorably.

Table 4: Comparison of GVM and reference algorithms. Entries represent the mean ratio
of the respective intercluster measure x obtained by GVM and reference algorithm.

Recovering Planted Partitions. To compare the different objective functions
qualitatively, we evaluated how well the corresponding GVM-algorithms are able
to reconstruct planted partitions in random graphs. As a comparison, we also give
the results obtained by MOD. Due to higher running times and large numbers
of experiments, we omit a comparison with ICC, MCL and GMC.
Random Graphs Generated. We use an adapted Erdős-Rényi model, where,
starting from a given reference partition, the probability that vertices in the same
set (in different sets) are connected equals $p_{in}$ ($p_{out}$). The number of vertices ($n$)
and clusters ($k$) as well as the skewness of the distribution of cluster sizes ($\beta$)
of the planted partition are input parameters. Setting $\beta$ to 1.0 corresponds to
uniform cluster sizes, values below and above 1 cause this distribution to be 
skewed, for more details see [TS]. As configurations, we fixed n to 10000 and 
chose $p_{in}$ and $p_{out}$ such that the average number of intracluster (intercluster)
edges a vertex is incident to equals 5 (3). To determine the reference partition,
we used all combinations of $k \in \{10, 100, 300\}$ and $\beta \in \{0.3, 1.0, 2.0\}$. For each
configuration, we generated 100 instances and always averaged the obtained values.
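For the special case of uniform cluster sizes, the generator and the choice of the two probabilities can be sketched as follows. The function names and the seeding are assumptions for illustration, and skewed cluster sizes are not covered. With clusters of size $s = n/k$, a vertex has $s - 1$ potential intracluster and $n - s$ potential intercluster neighbors, which gives $p_{in} = 5/(s-1)$ and $p_{out} = 3/(n-s)$ for the expected degrees used above.

```python
import random


def probabilities(n, k, deg_in, deg_out):
    """Edge probabilities yielding the given expected intra/intercluster degrees
    for k clusters of equal size n/k (uniform sizes are an assumption of this sketch)."""
    s = n / k
    return deg_in / (s - 1), deg_out / (n - s)


def planted_partition(n, k, p_in, p_out, seed=0):
    """Sample a graph with a planted partition into k contiguous blocks of equal size."""
    rng = random.Random(seed)
    label = [v * k // n for v in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if label[u] == label[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return label, edges


p_in, p_out = probabilities(10000, 10, 5, 3)
print(p_in, p_out)
label, edges = planted_partition(200, 4, *probabilities(200, 4, 5, 3))
print(len(edges))
```

The quadratic loop over all vertex pairs keeps the sketch short but is only practical for small $n$; it is not meant to reproduce instances with 10000 vertices efficiently.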

Distance Measures. To compare the clusterings obtained with the different al- 
gorithms with the reference clustering, we use the following graph-based distance 
measures taken from [9]: 

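– Graph-based Rand Index ($R_g$): Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be clusterings and $e_{11}$ ($e_{00}$)
  the number of edges which are intracluster (intercluster) with respect to both
  $\mathcal{C}_1$ and $\mathcal{C}_2$. Then, $R_g(\mathcal{C}_1, \mathcal{C}_2) = 1 - (e_{11} + e_{00})/m$.

– Editing Set Difference (ESD): For a clustering $\mathcal{C}$, its editing set $F_{\mathcal{C}}$ is the
  set of edges requiring insertion or removal such that the clusters in $\mathcal{C}$ form
  disjoint cliques. Then, for clusterings $\mathcal{C}_1$ and $\mathcal{C}_2$, their editing set difference
  is defined as $ESD(\mathcal{C}_1, \mathcal{C}_2) = 1 - |F_{\mathcal{C}_1} \cap F_{\mathcal{C}_2}| / |F_{\mathcal{C}_1} \cup F_{\mathcal{C}_2}|$.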
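Both distance measures can be computed directly from their definitions, as in the following minimal sketch. Clusterings are given as label lists, the function names are hypothetical, and returning 0 when both editing sets are empty is an assumption of the sketch, since the quotient is undefined in that case.

```python
from itertools import combinations


def graph_rand_index(edges, l1, l2):
    """R_g = 1 - (e_11 + e_00) / m, with clusterings given as label lists."""
    agree = sum(1 for u, v in edges if (l1[u] == l1[v]) == (l2[u] == l2[v]))
    return 1 - agree / len(edges)


def editing_set(n, edges, label):
    """Pairs to insert (same cluster, no edge) or remove (edge between clusters)."""
    edge_set = {frozenset(e) for e in edges}
    result = set()
    for u, v in combinations(range(n), 2):
        pair = frozenset((u, v))
        if (label[u] == label[v]) != (pair in edge_set):
            result.add(pair)
    return result


def esd(n, edges, l1, l2):
    f1, f2 = editing_set(n, edges, l1), editing_set(n, edges, l2)
    union = f1 | f2
    if not union:
        return 0.0  # assumption: both clusterings already consist of disjoint cliques
    return 1 - len(f1 & f2) / len(union)


# Path 0-1-2-3 with clusterings {0,1},{2,3} and {0,1,2},{3}.
PATH = [(0, 1), (1, 2), (2, 3)]
C1, C2 = [0, 0, 1, 1], [0, 0, 0, 1]
print(graph_rand_index(PATH, C1, C2), esd(4, PATH, C1, C2))
```

In the example only the edge {0,1} is classified identically by both clusterings, and the two editing sets are disjoint, so ESD takes its maximum value.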

Parameters and Evaluation. As an exhaustive parameter search for all configurations
would be far too expensive, we always set $\alpha$ to 75 percent of the
expected global intracluster density $p_{in}$. We deemed taking the actual value of
$p_{in}$ too strict, as, especially for mid, even the reference clustering of the generator
most likely does not meet this constraint. The previous experiments indicate that
there are configurations where particular objective functions used in GVM do
not score the best results with respect to themselves. As our goal is to compare
good clusterings with respect to different combinations of $i$ and $x$, independent
of artifacts of GVM, we chose the following approach: For a combination $i$, $\alpha$,
$x$, we evaluated the clustering that, among all results obtained with GVM using
$i \ge \alpha$ as constraint, is best with respect to $x$ (as opposed to simply evaluating
$\mathrm{GVM}_{i \ge \alpha, x}$). Furthermore, preliminary experiments confirmed that constraining
aid leads to very unintuitive and unbalanced clusterings, which is mirrored by
the fact that the corresponding versions of DCC are far less effective in finding
the hidden clustering. We hence excluded aid in the discussion of the results.
Results on Planted Partition Graphs. Figure 3 shows the results for selected
configurations; the results for the whole set of experiments can be found in
App. C. In the first plot it can be seen that, in general, the clusterings that are
ranked best with respect to mod, nxe and gxd are most similar to the reference.

Constraining modularity by mid improves its results. This especially holds for
the experiments with high skewness ($\beta = 2$) and $k = 300$. In these experiments,
modularity finds far fewer clusters than expected, partially due to its known resolution
limit [13], which can be circumvented by steering the coarseness of the
clustering by constraining the intracluster density. Another interesting fact is
that ESD punishes these coarse clusterings far more than $R_g$.

Fig. 3: Distance to reference clustering (boxplots, left-hand y-axis) and number of clusters
discovered in planted partition graphs (green x-marks, right-hand y-axis), different
configurations

Fine reference clusterings unbalance maximum objectives. Compared to the
above, especially mixc in combination with gid yields worse similarity values.
This, and the slightly increased cluster count, can be explained by a tendency of
mixc to favor unbalanced clusterings if the expected number of clusters is high
($k = 300$), which also explains why this effect does not happen in combination
with mid, which does not allow very unbalanced clusterings. To a smaller extent,
the same observation also holds for the other maximum measures, as can be seen
for $k = 300$ and $\beta = 1.0$.

aixe and especially aixd identify many clusters. Another striking observation
is that the average number of clusters in clusterings found by aixd and aixe,
indicated by the green x-marks, is much higher than the average number of
clusters in the reference. This especially stems from the experiments with few
clusters. In the configuration with $\beta = 1$ and $k = 10$, it can also be seen that
these measures differ the more, the coarser the expected clustering gets. This is
not unexpected, as the denominator of aixd grows more slowly with the number
of vertices in the cluster than the denominator of aixe, meaning that aixd is less
eager to produce very large clusters. Additionally, in [14] it was proven that
with the exception of aixd, all intercluster measures considered here can always
be ameliorated by merging two existing clusters (unboundedness), which is also a
hint that aixd is less likely to produce coarse clusterings than the other measures.

Implementation and Running Times. The algorithms ICC, MCL, GMC
and GM are implemented in Java 1.6.0_22 using the graph library yFiles [23].
GVM (also incorporating MOD as a special case) is implemented in C++ using
version 1_42 of the Boost Graph Library [5] and compiled with gcc 4.5.2 with
optimization level 4. The focus of this evaluation is on the quality of the resulting
clusterings, not on running times. However, to get a rough impression about the
latter, clustering cond-mat-2005 on a 2.1 GHz AMD Opteron processor takes
about 6 hours with ICC, 1 hour and 50 minutes with MCL, 5 minutes with GMC
and 3 to 15 seconds with GVM, depending on the parameter setting. With our
prototype implementation (not including the improvements proposed in [14]) of
GM, clustering the much smaller celegans_metabolic takes over 2 minutes.

5 Conclusion 

This work is an experimental evaluation of algorithms for the optimization prob- 
lem Density-Constrained Clustering (DCC). We first evaluated two greedy 
heuristics, vertex moving and cluster merging, against each other and against 
algorithms from the literature. Vertex moving proved reliably superior to clus- 
ter merging and, in many cases, beats the results of the reference algorithms. 
Our results also show that a well-known modularity-based algorithm implicitly
addresses DCC quite well, revealing similarities between cut-based intercluster
sparsity measures and modularity. In the second part, we addressed the question 
whether different combinations of intracluster density and intercluster sparsity 
measures are suitable to guide algorithms in recovering planted partitions in 
random graphs. The results suggest that minimizing the average intercluster ex- 
pansion or density of the clusters overestimates the number of clusters if the 
expected clustering is coarse, while the maximum intercluster measures lead to 
unbalanced clusters if the expected clustering is fine and the constraint on the 
intracluster density does not force the clustering to be balanced. Additionally, it 
can be seen that the known resolution limit for modularity can be circumvented 
if the coarseness of the clustering is controlled by an additional constraint on 
the intracluster density of the clustering. 
A Additional Examples and Explanations



Maximum Functions: Clusterings resulting from vertex moves can be
L-compared in constant time. For three distinct clusters $C$, $A$ and $B$ in $\mathcal{C}$
and $v \in C$ it holds that:

– $\mathcal{C}_{v \to A}$ is L-better than $\mathcal{C}$ $\Leftrightarrow$ $\{C \setminus \{v\}, A \cup \{v\}\}$ is L-better than $\{C, A\}$

– $\mathcal{C}_{v \to A}$ is L-better than $\mathcal{C}_{v \to B}$ $\Leftrightarrow$ $\{A \cup \{v\}, B\}$ is L-better than $\{B \cup \{v\}, A\}$

If the volume, size and number of out-going edges of the clusters $A$, $B$
and $C$ are maintained by the algorithm, the density/conductance/expansion of
$C$, $A$, $B$, $C \setminus \{v\}$, $A \cup \{v\}$ and $B \cup \{v\}$ can be determined in constant time. Hence,
the conditions on the right-hand side can be evaluated in constant time, which
can be used to determine the best move for a vertex efficiently.

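Connectedness of gxd. The following equation shows that GVM never has
to consider non-neighboring clusters for gxd, as isolating the respective vertex
is always more beneficial. Let $v \in V$, $A := C(v) \setminus \{v\}$ and $B \in \mathcal{C}$ such that
$m_{\{v\},B} = 0$. Then $\mathcal{C}_{v \to B}$ and $\mathcal{C}_{v \to \{\}}$ have the same number of intercluster
edges, whereas moving $v$ to $B$ turns the $|B|$ pairs between $v$ and $B$ into
intracluster pairs. With the sums ranging over all unordered pairs of distinct
clusters $C_i, C_j \in \mathcal{C}$, we get:

$$
\mathrm{gxd}(\mathcal{C}_{v \to B}) - \mathrm{gxd}(\mathcal{C}_{v \to \{\}})
= \frac{\mathrm{nxe}(\mathcal{C}_{v \to \{\}})}{\sum_{C_i \ne C_j} |C_i||C_j| + |A| - |B|}
- \frac{\mathrm{nxe}(\mathcal{C}_{v \to \{\}})}{\sum_{C_i \ne C_j} |C_i||C_j| + |A|}
\ge 0
$$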


(a) mixd, mixe (b) mixc (c) aixc, aixe, aixd 



Fig. 4: Examples illustrating that most measures considered do not enforce connected 
moves. Given the clusterings indicated by the gray areas, among all moves involving 
v, moving v to cluster Ci yields the largest decrease in the objective function.



B Effectiveness of Different Objective Functions: 
Additional Plots 
Fig. 5: Ranks for different intercluster density measures as objectives in the GVM-algorithm
using mid as constraint, evaluated by the intercluster density of the resulting
clustering. 
Fig. 6: Ranks for different intercluster density measures as objectives in the GVM-algorithm
using aid as constraint, evaluated by the intercluster density of the resulting
clustering. 



C Complete Experiments with Planted Partition Graphs 